Abstract. I review the statistical techniques needed to extract information about the physical parameters of galaxies from their observed spectra. This is important given the sheer size of the next generation of large galaxy redshift surveys. Going to the opposite extreme, I review what we can learn about the nature of the primordial density field from observations of high-redshift objects.



1. Extracting cosmological information from galaxy spectra 

Most of the information about the physical properties of galaxies comes from their electromagnetic spectrum. It is therefore of paramount importance to be able to extract as much physical information as possible from it. In principle, it is straightforward to determine physical parameters from an individual galaxy spectrum. The method consists of building synthetic stellar population models that cover a large enough range in parameter space and then using a merit function (typically a χ²) to evaluate which set of parameters best fits the observed spectrum. There are two obvious limitations of this method. First, the number of parameters that govern the spectrum of a galaxy may be very large, so the parameter space is difficult to explore fully. Second, ongoing large redshift surveys such as the 2dF and SDSS will provide us with about a million galaxy spectra, and it will be computationally very expensive (and possibly intractable) to apply a plain χ² fit to each individual spectrum, which itself may contain of the order of 10^3 data points.
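The brute-force approach just described can be sketched as a χ² search over a grid of model spectra. This is a minimal illustration assuming uncorrelated Gaussian pixel noise; the function `chi2_grid_fit` and the exponential "templates" are hypothetical stand-ins for real stellar population models, not part of any cited code.

```python
import numpy as np


def chi2_grid_fit(observed, sigma, templates):
    """Return the index of the minimum-chi^2 template and all chi^2 values.

    observed  : (n_pix,) observed spectrum
    sigma     : (n_pix,) 1-sigma noise per pixel (assumed uncorrelated)
    templates : (n_models, n_pix) synthetic spectra on a parameter grid
    """
    resid = (observed[None, :] - templates) / sigma[None, :]
    chi2 = np.sum(resid**2, axis=1)
    return int(np.argmin(chi2)), chi2


# Toy example (hypothetical): exponential "spectra" whose slope plays the
# role of a single physical parameter such as age.
wavelength = np.linspace(0.0, 1.0, 1000)
slopes = np.linspace(0.5, 3.0, 26)
templates = np.exp(-slopes[:, None] * wavelength[None, :])

rng = np.random.default_rng(0)
sigma = np.full_like(wavelength, 0.01)
observed = templates[10] + rng.normal(0.0, sigma)  # "true" model is index 10

best, chi2 = chi2_grid_fit(observed, sigma, templates)
print("best-fitting slope:", slopes[best])
```

The cost is one pass over every pixel for every model and every galaxy, which is what makes this approach expensive for surveys of a million spectra.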

The non-obvious route to tackle the problem is to compress the original data set so as to give more weight to those pixels in the spectrum that carry most information about a given parameter. It is worth recalling that a non-optimal form of data compression is already routinely applied to galaxy spectra: photometric filters. Not surprisingly, this empirical data compression is not optimal, since it was not devised to be lossless, i.e. to retain all the information about a given parameter. For example, the photometric B filter alone is not optimal for recovering the age of a galaxy. On the other hand, more sophisticated, non-empirical methods have been proposed for extracting information from galaxy spectra, some of them as old as the Johnson filter system itself. Many of these are based on Principal Component Analysis (PCA) or wavelet decomposition (Murtagh & Heck 1987; Francis et al. 1992; Connolly et al. 1995; Folkes, Lahav & Maddox 1996; Galaz & de Lapparent 1998; Bromley et al. 1998; Glazebrook, Offer & Deeley 1998; Singh, Gulati & Gupta 1998; Connolly & Szalay 1999; Ronen, Aragon-Salamanca & Lahav 1999; Folkes et al. 1999). PCA projects galaxy spectra onto a small number of orthogonal components, and the weight of each component reflects its relative importance in the spectra. However, while these components appear to correlate well with the physical properties of galaxies, their interpretation is difficult because they do not correspond to known, specific physical properties; they can be amalgams of different properties. To interpret these components, we have to return to model spectra and compare them with the components (Ronen, Aragon-Salamanca & Lahav 1999). This is a disadvantage of PCA, since one important goal of the analysis is to study the evolution of the physical properties that dramatically affect galaxy spectra, such as age, metallicity, star formation history or dust content. More sophisticated methods have recently been proposed (Heavens, Jimenez & Lahav 2000; Slonim et al. 2000). Here I concentrate on describing the optimal parameter extraction method proposed by Heavens, Jimenez & Lahav (2000).
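For concreteness, the PCA projection described above can be sketched with a singular value decomposition of the mean-subtracted spectra. This is a minimal illustration only, not the implementation used in any of the cited works; the function name `pca_spectra` and the two-shape toy spectra are hypothetical.

```python
import numpy as np


def pca_spectra(spectra, n_components):
    """Project spectra onto their leading principal components.

    spectra : (n_galaxies, n_pix) array, one spectrum per row
    Returns (components, coefficients, explained_variance_ratio).
    """
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    # Rows of vt are orthonormal "eigenspectra", ordered by variance.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    coefficients = centred @ components.T  # weight of each component per galaxy
    explained = s**2 / np.sum(s**2)
    return components, coefficients, explained[:n_components]


# Toy data (hypothetical): each spectrum mixes two fixed shapes plus noise.
rng = np.random.default_rng(1)
wl = np.linspace(0.0, 1.0, 500)
shape_a = np.exp(-wl)                            # smooth continuum
shape_b = np.exp(-(((wl - 0.5) / 0.05) ** 2))    # emission-line-like bump
weights = rng.uniform(0.5, 1.5, size=(200, 2))
spectra = weights[:, :1] * shape_a + weights[:, 1:] * shape_b
spectra += rng.normal(0.0, 0.01, spectra.shape)

components, coeffs, frac = pca_spectra(spectra, n_components=3)
print("explained variance fractions:", np.round(frac, 4))
```

Two components capture almost all of the variance here, but each eigenspectrum is in general a mixture of the two input shapes, which illustrates the interpretational difficulty noted above.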

The main idea of the method in Heavens, Jimenez & Lahav (2000) is that, in practice, some of the data may tell us very little about the parameters we are trying to estimate, either because they are very noisy or because they are insensitive to the parameters. So, in principle, we may be able to throw away some data without losing much information about the parameters. Simply discarding data, however, is clearly not optimal. We can do better by forming linear combinations of the data and then discarding the combinations that tell us least. Given a set of data $x$ (in our case the spectrum of a galaxy) which includes a signal part $\mu$ and noise $n$, i.e. $x = \mu + n$, the idea is to find a weighting vector $b$ such that $y = b^{T} x$; it is these numbers $y$ which we are after.
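The advantage of weighting over discarding can be checked in the simplest case of a single parameter $\theta$, with Gaussian noise whose covariance $C$ does not depend on $\theta$. Writing $\mu_{,\theta} \equiv \partial\mu/\partial\theta$, the Fisher information of the full data set is $\mu_{,\theta}^{T} C^{-1} \mu_{,\theta}$, while that of a single combination $y = b^{T} x$ is $(b^{T}\mu_{,\theta})^2 / (b^{T} C b)$. The sketch below, with randomly generated (hypothetical) derivatives and noise variances, compares discarding the noisiest half of the pixels with keeping one combination weighted by $b \propto C^{-1}\mu_{,\theta}$.

```python
import numpy as np


def fisher_full(dmu, cov):
    """Fisher information for one parameter using every pixel."""
    return float(dmu @ np.linalg.solve(cov, dmu))


def fisher_compressed(b, dmu, cov):
    """Fisher information carried by the single number y = b^T x."""
    return float((b @ dmu) ** 2 / (b @ cov @ b))


rng = np.random.default_rng(2)
n = 400
dmu = rng.normal(size=n)                  # hypothetical d(mu)/d(theta)
variances = rng.uniform(0.1, 10.0, n)     # hypothetical per-pixel noise variances
cov = np.diag(variances)

F_all = fisher_full(dmu, cov)

# Option 1: throw away the noisiest half of the pixels.
keep = variances < np.median(variances)
F_cut = fisher_full(dmu[keep], cov[np.ix_(keep, keep)])

# Option 2: keep one weighted linear combination, b proportional to C^{-1} dmu.
b = np.linalg.solve(cov, dmu)
F_y = fisher_compressed(b, dmu, cov)

print(f"all pixels: {F_all:.1f}, half the pixels: {F_cut:.1f}, one number y: {F_y:.1f}")
```

With this choice of $b$ the single number $y$ carries the full Fisher information of all the pixels, whereas discarding pixels always loses some; the overall scale of $b$ is irrelevant.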

In Heavens, Jimenez & Lahav (2000) an optimal and lossless method was found to calculate $b$ for multiple parameters (as is the case with galaxy spectra). Specifically:



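$$ b_1 = \frac{C^{-1}\mu_{,1}}{\sqrt{\mu_{,1}^{T}\, C^{-1}\, \mu_{,1}}} \tag{1} $$

$$ b_m = \frac{C^{-1}\mu_{,m} - \sum_{q=1}^{m-1} \left(\mu_{,m}^{T} b_q\right) b_q}{\sqrt{\mu_{,m}^{T}\, C^{-1}\, \mu_{,m} - \sum_{q=1}^{m-1} \left(\mu_{,m}^{T} b_q\right)^2}} \quad (m > 1) \tag{2} $$

where a comma denotes the partial derivative with respect to the parameter $\theta_m$ (so $\mu_{,m} \equiv \partial\mu/\partial\theta_m$) and $C$ is the noise covariance matrix with components $C_{ij} = \langle n_i n_j \rangle$.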
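The weighting vectors of Heavens, Jimenez & Lahav (2000) are built by starting from $C^{-1}\mu_{,1}$ and, for each further parameter, removing from $C^{-1}\mu_{,m}$ its projections onto the previous vectors and normalizing; this is a Gram-Schmidt orthogonalization in the inner product $u^{T} C v$, so that $b_m^{T} C b_n = \delta_{mn}$. A minimal numpy sketch follows, assuming the derivatives are supplied as an array with one row per parameter; the function name `moped_vectors` and the random test inputs are hypothetical, and this is not the authors' code.

```python
import numpy as np


def moped_vectors(dmu, cov):
    """Weighting vectors b_1..b_M with b_m^T C b_n = delta_mn.

    dmu : (M, n_pix) array; row m is d(mu)/d(theta_m) at the fiducial model
    cov : (n_pix, n_pix) noise covariance matrix C
    """
    cinv_dmu = np.linalg.solve(cov, dmu.T).T   # row m: C^{-1} mu_{,m}
    bs = []
    for m in range(dmu.shape[0]):
        proj = [dmu[m] @ b for b in bs]        # mu_{,m}^T b_q for q < m
        num = cinv_dmu[m] - sum(p * b for p, b in zip(proj, bs))
        norm2 = dmu[m] @ cinv_dmu[m] - sum(p * p for p in proj)
        bs.append(num / np.sqrt(norm2))
    return np.array(bs)


# Usage with hypothetical random derivatives and a diagonal covariance.
rng = np.random.default_rng(3)
n_pix, n_par = 300, 3
dmu = rng.normal(size=(n_par, n_pix))
cov = np.diag(rng.uniform(0.5, 2.0, n_pix))
B = moped_vectors(dmu, cov)
print(np.round(B @ cov @ B.T, 10))  # should be the 3x3 identity
```

Solving with `np.linalg.solve` rather than forming $C^{-1}$ explicitly is the usual numerically safer choice; for a diagonal $C$ one could equally divide by the variances.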

The specific steps to build $M$ linear combinations to estimate $M$ parameters are the following:

1. Choose a "fiducial" model (a first guess).

2. Compute the mean spectrum $\mu$ at the fiducial parameters and its $M$ partial derivatives $\mu_{,m}$ with respect to the $M$ parameters.

3. Compute the $M$ weighting vectors $b_m$ from Eqs. 1 and 2.

4. Finally, compute the $M$ numbers $y_m = b_m^{T} x$. Because $b_m^{T} C b_n = \delta_{mn}$, these are uncorrelated, have unit variance and have mean $\langle y_m \rangle = b_m^{T}\mu(\theta)$, so the new likelihood is easy to compute, namely:

$$ \ln\mathcal{L}(\theta) = \text{constant} - \frac{1}{2}\sum_{m=1}^{M}\left[y_m - b_m^{T}\mu(\theta)\right]^2 $$



This procedure can be applied to derive the metallicities, ages, star formation rates and dust content of galaxies from their spectra (Reichardt, Jimenez & Heavens 2000).

It is very instructive to illustrate the method by trying to recover the age and normalization of a single stellar population (SSP), i.e. one whose star formation rate is $\mathrm{SFR}(t') = A\,\delta(t' + t)$, where $\delta$ is a Dirac delta function. The two parameters to determine are the age $t$ and the normalization $A$. We built a simulated spectrum with Gaussian noise whose variance is given by the mean, $C = \mathrm{diag}(\mu_1, \mu_2, \ldots)$. This is appropriate for photon number counts when the number is large. It should be stressed that this is a more severe test of the method than a typical galaxy spectrum, where the noise is likely to be dominated by sources independent of the galaxy, such as CCD read-out noise or sky background counts. In the latter case, the compression method does even better than in the example here (e.g. Reichardt, Jimenez & Heavens 2000). The simulated galaxy spectrum is one of the model spectra (age 3.95 Gyr, model number 100), and the maximum signal-to-noise per bin is taken to be 2. Noise is added as approximate photon noise, i.e. with a Gaussian distribution whose variance equals the number of photons in each channel. Figure 1 shows the contours of the likelihood surface using all the points in the spectrum. Figure 2 shows the contours of the likelihood surface using only two linear combinations, $y_1$ and $y_2$. As the figures show, two numbers suffice to determine the two parameters.

[Figure 1 plot; axis: Age Index]
Figure 1. Full likelihood solution using all pixels. There are 6 contours running down from the peak value in steps of 0.5 (in $\ln\mathcal{L}$), and 3 outer contours at -100, -1000 and -10000. The triangle in the upper-right corner marks the fiducial model, which determines the weighting vectors used to set the initial weights.

[Figure 2 plot; axis: Age Index]
Figure 2. Likelihood solution using the compressed data set, i.e. the age datum and the normalization datum. Contours are as in Fig. 1.
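The whole procedure (fiducial model, derivatives, weighting vectors, compression and compressed likelihood) can be exercised end to end on a toy two-parameter problem in the spirit of the SSP test. In this sketch the "spectrum" `model(t, amp)` is a hypothetical smooth function, not a stellar population model; derivatives are taken numerically; the noise is a Gaussian approximation to photon noise with $C = \mathrm{diag}(\mu)$ evaluated at the fiducial model; and the counts are deliberately much higher than in the test above so that the outcome is stable. The fiducial point is offset from the true parameters, as in Figure 1.

```python
import numpy as np

WL = np.linspace(0.0, 1.0, 500)  # hypothetical wavelength grid (arbitrary units)


def model(t, amp):
    """Hypothetical toy 'spectrum': mean counts per bin for age-like t and normalization amp."""
    return amp * 1000.0 * (np.exp(-t * WL) + 0.5)


def moped_vectors(dmu, cov):
    """Gram-Schmidt construction of weights with b_m^T C b_n = delta_mn."""
    cinv_dmu = np.linalg.solve(cov, dmu.T).T
    bs = []
    for m in range(dmu.shape[0]):
        proj = [dmu[m] @ b for b in bs]
        num = cinv_dmu[m] - sum(p * b for p, b in zip(proj, bs))
        norm2 = dmu[m] @ cinv_dmu[m] - sum(p * p for p in proj)
        bs.append(num / np.sqrt(norm2))
    return np.array(bs)


# Step 1: fiducial model, deliberately offset from the true parameters.
t0, a0, h = 1.5, 1.2, 1e-4
mu0 = model(t0, a0)
cov = np.diag(mu0)  # photon noise: variance equals the mean counts

# Step 2: numerical derivatives of the mean spectrum at the fiducial point.
dmu = np.array([
    (model(t0 + h, a0) - model(t0 - h, a0)) / (2 * h),
    (model(t0, a0 + h) - model(t0, a0 - h)) / (2 * h),
])

# Step 3: weighting vectors.
B = moped_vectors(dmu, cov)

# Simulated observation: true t = 2.0, A = 1.0.
rng = np.random.default_rng(4)
mu_true = model(2.0, 1.0)
x = mu_true + np.sqrt(mu_true) * rng.normal(size=WL.size)

# Step 4: compress 500 pixels into two numbers; evaluate the compressed likelihood.
y = B @ x
t_grid = np.linspace(1.5, 2.5, 101)
a_grid = np.linspace(0.9, 1.1, 101)
lnL = np.array([[-0.5 * np.sum((y - B @ model(t, a)) ** 2) for t in t_grid]
                for a in a_grid])
ia, it = np.unravel_index(np.argmax(lnL), lnL.shape)
t_hat, a_hat = t_grid[it], a_grid[ia]
print(f"estimated t = {t_hat:.2f}, A = {a_hat:.3f}")
```

Only $y_1$ and $y_2$ enter the likelihood, yet its maximum should land close to the input parameters even though the weights were computed at a different (fiducial) point.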
Figure 3. The effect on the PDF of a Gaussian field of adding a term proportional to the square of the field itself. Note how the peaks get enhanced and the valleys suppressed.

2. The abundance of high redshift objects 

Now that cosmic microwave background (CMB) experiments (de Bernardis et al. 2000; Jaffe et al. 2000) have verified the inflationary predictions of a flat Universe and of structure formation from primordial adiabatic perturbations, we are compelled to test further the predictions of the simplest single-scalar-field slow-roll inflation models and to look for possible deviations. Measurements of the distribution of primordial density perturbations afford such tests. The observed abundance of high-redshift objects contains precious information about the properties of the initial conditions. The reason is that the first objects to collapse, for a given mass, arise from fluctuations in the tail of the distribution of the primordial density field, and therefore reflect the "strength" of that tail. Furthermore, high-z objects constrain the small-scale part of the spectrum of the primordial mass density field, which cannot be probed directly by large-scale structure or CMB observations.

The importance of using the mass function as a tool to distinguish among different non-Gaussian statistics for the primordial density field was first recognized by Lucchin & Matarrese (1988) and Colafrancesco, Lucchin & Matarrese (1989), and more recently by Chiu, Ostriker & Strauss (1998), followed by Robinson & Baker (1999), Robinson, Gawiser & Silk (1999a, 1999b), Koyama, Soda & Taruya (1999), Willick (2000) and Avelino & Viana (1999). To make predictions on the number counts of high-redshift structures in the context of non-Gaussian initial conditions, a generalized version of the Press-Schechter (PS) theory has to be introduced. The PS theory exploits the fact that in most cosmological scenarios the large-scale power exceeds that generated by non-linear coupling. This, in conjunction with an "artificial" filtering of the initial density field and a threshold above which regions are able to collapse into objects, provides a description of the mass function in terms of the probability density function (PDF) of the smoothed field (see Peacock 1999). Thus, in order to extend the PS theory to the non-Gaussian case, one needs to compute the "smoothed" PDF of the non-Gaussian field. Furthermore, numerical simulations show that the PS theory provides a reasonable approximation for the number of objects produced in the tails, provided we do not consider fluctuations that deviate by more than 5σ from the mean, beyond which serious deviations from the PS prediction occur (Press & Schechter 1974; Lee & Shandarin 1998; Sheth & Tormen 1999; Jenkins et al. 2000). Obtaining analytical results in this context is extremely important. Direct simulations of non-Gaussian fields are generally plagued by the difficulty of properly accounting for the non-linear way in which resolution and finite box-size effects, present in any realization of the underlying Gaussian process, propagate into the statistical properties of the non-Gaussian field. Moreover, finite-volume realizations of non-Gaussian fields might fail to produce fair samples of the assumed statistical distribution, i.e. ensemble and (finite-volume) spatial distributions might differ appreciably. This problem is, of course, exacerbated, and hard to keep under control, in the tails of the distribution. Thus, when looking for the likelihood of rare events in a non-Gaussian density field, exact or approximate analytical estimates should be considered the primary tool.
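To see why the tails matter, recall the Gaussian PS result: the fraction of mass in collapsed objects of mass greater than $M$ is $\mathrm{erfc}(\nu/\sqrt{2})$ with $\nu = \delta_c/\sigma(M)$, where $\sigma(M)$ is the rms linear fluctuation smoothed on the scale of $M$ and $\delta_c \approx 1.686$ is the spherical-collapse threshold (the Einstein-de Sitter value, assumed here). A minimal sketch, with the hypothetical function name `ps_collapsed_fraction`:

```python
import math

DELTA_C = 1.686  # assumed spherical-collapse threshold (Einstein-de Sitter value)


def ps_collapsed_fraction(sigma_m, delta_c=DELTA_C):
    """Gaussian Press-Schechter mass fraction in objects above mass M.

    sigma_m is the rms linear density fluctuation smoothed on the scale of M.
    The PS factor of 2 is included: F(>M) = 2 P(delta > delta_c) = erfc(nu / sqrt(2)).
    """
    nu = delta_c / sigma_m
    return math.erfc(nu / math.sqrt(2.0))


for nu in (1.0, 3.0, 5.0):
    frac = ps_collapsed_fraction(DELTA_C / nu)
    print(f"nu = {nu}: collapsed fraction = {frac:.3g}")
```

Going from ν = 3 to ν = 5 lowers the collapsed fraction by more than three orders of magnitude, so the abundance of such rare objects is extremely sensitive to the shape of the PDF tail; a non-Gaussian extension replaces the Gaussian tail probability with that of the smoothed non-Gaussian PDF.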

Robinson & Baker (1999) and Robinson, Gawiser & Silk (1999a, 1999b; hereafter RGS) considered a PDF with a log-normal distribution and assumed, based on comparisons with numerical experiments, that it described the smoothed field of fluctuations for a wide range of non-Gaussian models (mostly those arising from structure formation by topological defects). Their non-Gaussianity depends on a single parameter, G, which is simply the ratio of the number of 3σ peaks in a non-Gaussian model to that in the Gaussian case. An Einstein-de Sitter (EdS) universe produces a noticeable deficit of high-redshift objects (e.g. Peacock et al. 1998); RGS were able to find a region in the $(\sigma_8, G)$ plane for which the predicted cluster abundance in an EdS universe agrees with observations (but see below).

Willick (2000) studied in great detail the mass determination of the high-redshift cluster MS1054-03, concluding that its mass lies in the range (1.4 ± 0.3) × 10^15 M_⊙ for Ω_m = 0.3. He then investigated the amount of non-Gaussianity needed to accommodate this cluster within the CDM scenario, using a parameterization of non-Gaussianity similar to that of Robinson, Gawiser & Silk (1999a). He found that MS1054-03 cannot be accommodated in a CDM scenario with Ω_m > 0.3 unless some non-Gaussianity exists.

[Figure 4 plot]
Figure 4. Ratio $R(M,z) = N_{ng}(>M,z)/N(>M,z)$ as a function of $M$, for galaxies at redshift z = 8, 9 and 10 with ε = 5 × 10^ (non-Gaussianity in the density field; left panel) and for clusters at redshift z = 1, 2 and 3 with ε = 200 (non-Gaussianity in the potential; right panel). Lines are plotted only for masses where, for Gaussian initial conditions, one would expect to observe at least one object in the whole sky. Note that these high-redshift objects represent 3σ to 5σ peaks. The values of the number-density enhancement R that can safely be attributed to primordial non-Gaussianity are R = 100 for galaxies (left panel) and R = 10 for clusters (right panel).

Matarrese, Verde & Jimenez (2000) have computed an analytic expression for the probability distribution function for a parameterization of primordial non-Gaussianity that covers a wide range of physically motivated models: the non-Gaussian field is given by a Gaussian field plus a term proportional to the square of that Gaussian field, $\Phi = \phi + \epsilon(\phi^2 - \langle\phi^2\rangle)$, where $\Phi$ may represent either the density perturbation field $\delta(x)$ or the primordial gravitational potential. They also introduced a generalized version of the PS approach valid for non-Gaussian initial conditions. Note that Matarrese, Verde & Jimenez (2000) considered only small departures from Gaussianity, and showed how these tiny departures can have a large impact on the number density of observed objects at high redshift (see their Fig. 6). Note also that if one considers large deviations from Gaussianity, the normalization of $\sigma_8$ derived for Gaussian initial conditions is no longer valid.
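The tail enhancement shown in Figure 3 can be made concrete in the simplest setting: the one-point distribution of $\Phi = \phi + \epsilon(\phi^2 - \langle\phi^2\rangle)$ for an unsmoothed, unit-variance Gaussian $\phi$ and $\epsilon \ge 0$. Because $\Phi$ is a quadratic function of $\phi$, the probability that $\Phi$ exceeds $\nu$ of its own standard deviations follows exactly from the two roots of a quadratic. This is a simplified sketch (function names hypothetical) that ignores smoothing and mass dependence; it is not the analytic expression of Matarrese, Verde & Jimenez (2000).

```python
import math


def gauss_tail(x):
    """P(phi > x) for a zero-mean, unit-variance Gaussian phi."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def quad_ng_tail(nu, eps):
    """P(Phi > nu * sigma_Phi) for Phi = phi + eps * (phi**2 - 1), phi ~ N(0, 1).

    Var(Phi) = 1 + 2 eps^2. For eps > 0, Phi > thr exactly when phi lies above
    the larger root or below the smaller root of eps*phi^2 + phi - eps - thr = 0.
    """
    if eps < 0.0:
        raise ValueError("this sketch assumes eps >= 0")
    thr = nu * math.sqrt(1.0 + 2.0 * eps**2)
    if eps == 0.0:
        return gauss_tail(thr)
    disc = 1.0 + 4.0 * eps * (eps + thr)  # positive for nu >= 0
    root_hi = (-1.0 + math.sqrt(disc)) / (2.0 * eps)
    root_lo = (-1.0 - math.sqrt(disc)) / (2.0 * eps)
    return gauss_tail(root_hi) + gauss_tail(-root_lo)


def tail_ratio(nu, eps):
    """Enhancement of nu-sigma excursions relative to the Gaussian case."""
    return quad_ng_tail(nu, eps) / gauss_tail(nu)


for eps in (0.0, 0.05, 0.1):
    print(eps, [round(tail_ratio(nu, eps), 2) for nu in (3.0, 4.0, 5.0)])
```

Even a small positive ε enhances the probability of 4σ to 5σ excursions by a large factor, which is why the abundance of rare high-redshift objects is a sensitive probe of primordial non-Gaussianity.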

Verde et al. (2000b) have devised a method to constrain non-Gaussianity by studying the size-temperature distribution of galaxy clusters, which is sensitive to the redshift at which the clusters formed. If clusters originate from rare peaks of an initially Gaussian distribution, the spread in formation redshift should be small, and so should the scatter in the size-temperature distribution. On the other hand, if the initial distribution has long non-Gaussian tails, the clusters we observe today should have formed over a broad redshift interval and should therefore show a large scatter in the size-temperature distribution. They found that, for the non-Gaussian parameters derived by Robinson, Gawiser & Silk (1999a) to explain the observed abundance of high-redshift objects in an EdS universe, the spread in the size-temperature relation would be much larger than is currently observed. This excludes the possibility that an EdS universe could be reconciled with observations of high-redshift objects through a large amount of non-Gaussianity. It is worth noting, though, that this would not be the case for non-Gaussianity arising from a bimodal distribution with one of the modes centered at the cluster scale.

A comparison of the sensitivity of several tests for detecting non-Gaussianity has been carried out by Verde et al. (2000a, 2000c). Using the kind of non-Gaussianity described in Matarrese, Verde & Jimenez (2000), they conclude that the CMB is superior at finding non-Gaussianity in the primordial gravitational potential (as inflation would produce), while observations of high-redshift galaxies are much better suited to finding non-Gaussianity resembling that expected from topological defects. Thus observations of high-redshift objects with the Next Generation Space Telescope and the currently proposed 30-100 m class telescopes should help us shed light on the nature of the primordial density field if (and this is a big if) mass determinations of these objects can be obtained to within 100% error (see Fig. 4).

It is a pleasure to thank my collaborators in this work: Alan Heavens, 
Marc Kamionkowski, Ofer Lahav, Sabino Matarrese and Licia Verde. 